State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposing BCNN with Separable Filters (BCNNw/SF), which applies Singular Value Decomposition (SVD) on BCNN kernels to further reduce computational and storage complexity. To enable its implementation, we provide a closed form of the gradient over SVD to calculate the exact gradient with respect to every binarized weight in backward propagation. We verify BCNNw/SF on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of 17% and an execution time reduction of 31.3% compared to BCNN, with only minor accuracy sacrifices.
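The core separable-filter idea can be sketched with plain NumPy: SVD factorizes a 2-D kernel, and truncating to the leading singular component yields a pair of 1-D filters whose outer product approximates the original kernel. This is an illustrative sketch only; the kernel values below and the rank-1 truncation are assumptions for demonstration, not the paper's exact training procedure.

```python
import numpy as np

# Hypothetical 3x3 binarized kernel (+1/-1 entries), as in a BCNN layer.
K = np.array([[ 1, -1,  1],
              [ 1, -1,  1],
              [ 1, -1,  1]], dtype=float)

# SVD factorizes K into U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(K)

# Rank-1 (separable) approximation: outer product of the leading
# singular vectors, scaled by the leading singular value.
v = np.sqrt(S[0]) * U[:, 0]   # vertical 1-D filter
h = np.sqrt(S[0]) * Vt[0, :]  # horizontal 1-D filter
K_sep = np.outer(v, h)

# This particular kernel is exactly rank 1, so the approximation is exact.
print(np.allclose(K, K_sep))  # True
```

The payoff is that convolving with a separable k×k kernel can be done as two 1-D convolutions (roughly 2k multiply-accumulates per output instead of k²), and only the two 1-D filters need to be stored, which is the source of the compute and memory savings the abstract describes.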